Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Author: E-J Im
J Mellor-Crummey
M Krotkiewski
R Nishtala
Publication date: 05/02/2013

Intel Xeon Phi is a recently released high-performance coprocessor which features 61 cores, each supporting 4 hardware threads, with 512-bit wide SIMD registers, achieving a peak theoretical performance of 1 Tflop/s in double precision. Many scientific applications involve operations on large sparse matrices, such as linear solvers, eigensolvers, and graph mining algorithms. The core of most of these applications involves the multiplication of a large, sparse matrix with a dense vector (SpMV). In this paper, we investigate the performance of the Xeon Phi coprocessor for SpMV. We first provide a comprehensive introduction to this new architecture and analyze its peak performance with a number of micro benchmarks. Although the design of a Xeon Phi core is not much different from those of the cores in modern processors, its large number of cores and hyperthreading capability allow many applications to saturate the available memory bandwidth, which is not the case for many cutting-edge processors. Yet, our performance studies show that it is the memory latency, not the bandwidth, which creates a bottleneck for SpMV on this architecture. Finally, our experiments show that Xeon Phi's sparse kernel performance is very promising and even better than that of cutting-edge general purpose processors and GPUs.
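As an illustration of the SpMV operation the abstract refers to, here is a minimal sketch in plain Python. It assumes the matrix is stored in compressed sparse row (CSR) form, a common layout that the abstract itself does not name; the function name `spmv_csr` and the array names `values`, `col_idx`, and `row_ptr` are hypothetical and not taken from the paper. The indirect read `x[col_idx[k]]` is the kind of irregular access that can make such kernels sensitive to memory behavior.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values  : nonzero entries of A, listed row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1
    x       : dense input vector
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect (gather) access into x through the column index.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix (assumed for illustration):
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4, 1, 3, 2, 5]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1, 2, 3])
```

A production kernel would vectorize and parallelize the row loop; this sketch only shows the data layout and access pattern.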

arXiv.org e-Print Archive

Crossref

Exascale Algorithms for Generalized MPI_Comm_split

Author: A. Moody
B. Jenkins
E. Gabriel
J. Mellor-Crummey
P. Sack
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Abstract not provided

Crossref

UNT Digital Library

Impact of Corporate Governance Practices on Firm Capital Structure and Profitability: A Study of Selected Hotels and Restaurant Companies in Sri Lanka.

Author: Crowl L. A.
Dibble P. C. (1953 - )
Gafter N. M. (1960 - )
LeBlanc T. J.
Mellor-Crummey J. M. (1962 - )
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 29/08/2013

Corporate governance issues have been a growing area of management research, especially among large and listed firms. Good corporate governance practices are regarded as important in reducing risk for investors, attracting investment capital and improving the performance of companies. Companies need financial resources and better earnings to promote their objectives. Therefore, factors that may affect the capital structure and profitability of companies should be considered carefully. The purpose of the present study is to investigate whether there is any relationship among some specific characteristics of corporate governance, capital structure and profitability of listed Hotels & Restaurant companies on the Colombo Stock Exchange (CSE). To do so, 18 companies were selected from those which were listed on the CSE during 2007-2012. 'Board Composition (BC)', 'Board Size (BS)' and 'CEO duality (CEOD)' were considered as independent variables, whereas 'Debt Ratio (DR)', 'Debt-to-Equity Ratio (DER)', 'Return on Equity (ROE)' and 'Return on Assets (ROA)' were considered as dependent variables. The results indicate a positive relationship among BS, BC, CEOD, ROE, ROA and DER, whereas there is a negative relationship among BS, BID and DR. In addition, CEOD has a positive relationship with DR. However, none of the variables has a significant relationship with capital structure and profitability. Key words: Corporate Governance; Capital Structure and Profitability

UR Research

International Institute for Science, Technology and Education (IISTE): E-Journals

A Separation Logic for Fictional Sequential Consistency

Author: E. Cohen
J. Alglave
J.M. Mellor-Crummey
K. Svendsen
P. Rocha Pinto da
S. Owens
T. Dinsdale-Young
T. Ridge
Publication venue: https://link.springer.com/chapter/10.1007/978-3-662-46669-8_30#enumeration
Publication date: 01/01/2015

To improve performance, modern multiprocessors and programming languages typically implement relaxed memory models that do not require all processors/threads to observe memory operations in the same order. To relieve programmers from having to reason directly about these relaxed behaviors, languages often provide efficient synchronization primitives and concurrent data structures with stronger high-level guarantees about memory reorderings. For instance, locks usually ensure that when a thread acquires a lock, it can observe all memory operations of the releasing thread, prior to the release. When used correctly, these synchronization primitives and data structures allow clients to recover a fiction of a sequentially consistent memory model. In this paper we propose a new proof system, iCAP-TSO, that captures this fiction formally, for a language with a TSO memory model. The logic supports reasoning about libraries that directly exploit the relaxed memory model to achieve maximum efficiency. When these libraries provide sufficient guarantees, the logic hides the underlying complexity and admits standard separation logic rules for reasoning about their more high-level clients.

Crossref

Apollo (Cambridge)

Barrier elision for production parallel programs

Author: Chabbi M
De Jong W
Iancu C
Lavrijsen W
Mellor-Crummey J
Sen K
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

Large scientific code bases are often composed of several layers of runtime libraries, implemented in multiple programming languages. In such situations, programmers often choose conservative synchronization patterns, leading to suboptimal performance. In this paper, we present context-sensitive dynamic optimizations that elide barriers that are redundant during program execution. In our technique, we perform data race detection alongside the program to identify redundant barriers in their calling contexts; after an initial learning phase, we start eliding all future instances of barriers occurring in the same calling context. We present an automatic on-the-fly optimization and a multi-pass guided optimization. We apply our techniques to NWChem, a 6 million line computational chemistry code written in C/C++/Fortran that uses several runtime libraries such as Global Arrays, ComEx, DMAPP, and MPI. Our technique elides a surprisingly high fraction of barriers (as many as 63%) in production runs. This redundancy elimination translates to application speedups as high as 14% on 2048 cores. Our techniques also provided valuable insight about the application behavior, later used by NWChem developers. Overall, we demonstrate the value of holistic context-sensitive analyses that consider the domain science in conjunction with the associated runtime software stack.

Crossref

eScholarship - University of California

On the nature of progress

Author: G. Taubenfeld
H. Attiya
J. Aspnes
J. Mellor-Crummey
L. Lamport
M. Herlihy
M.P. Herlihy
N. Lynch
S. Heller
T.L. Harris
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

15th International Conference, OPODIS 2011, Toulouse, France, December 13-16, 2011. Proceedings

We identify a simple relationship that unifies seemingly unrelated progress conditions ranging from the deadlock-free and starvation-free properties common to lock-based systems, to non-blocking conditions such as obstruction-freedom, lock-freedom, and wait-freedom. Properties can be classified along two dimensions based on the demands they make on the operating system scheduler. A gap in the classification reveals a new non-blocking progress condition, weaker than obstruction-freedom, which we call clash-freedom. The classification provides an intuitively appealing explanation of why programmers continue to devise data structures that mix both blocking and non-blocking progress conditions. It also explains why the wait-free property is a natural basis for the consensus hierarchy: a theory of shared-memory computation requires an independent progress condition, not one that makes demands of the operating system scheduler.

CiteSeerX

DSpace@MIT

Crossref

Eraser

Author: BERSHAD B. N.
DINNING A.
DINNING BERG
Greg Nelson
LEE E. K.
MELLOR-CRUMMEY
Michael Burrows
Patrick Sobalvarro
PERKOVIC D.
SCALES D. J.
SRIVASTAVA A.
Stefan Savage
Thomas Anderson
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Pessimistic Software Lock-Elision

Author: A. Adl-Tabatabai
C. Fetzer
D. Dice
H. Attiya
I. Keidar
J. Mellor-Crummey
M. Kapalka
M. Spear
T. Harris
T. Riegel
T. Shpeisman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Read-write locks are one of the most prevalent lock forms in concurrent applications because they allow read accesses to locked code to proceed in parallel. However, they do not offer any parallelism between reads and writes. This paper introduces pessimistic lock-elision (PLE), a new approach for non-speculatively replacing read-write locks with pessimistic (i.e. non-aborting) software transactional code that allows read-write concurrency even for contended code and even if the code includes system calls. On systems with hardware transactional support, PLE will allow failed transactions, or ones that contain system calls, to preserve read-write concurrency. Our PLE algorithm is based on a novel encounter-order design of a fully pessimistic STM system that, in a variety of benchmarks spanning from counters to trees, even when up to 40% of calls are mutating the locked structure, provides up to 5 times the performance of a state-of-the-art read-write lock.

National Science Foundation (U.S.) (Grant 1217921)

CiteSeerX

DSpace@MIT

Crossref

Efficient Symmetry Reduction and the Use of State Symmetries for Symbolic Model Checking

Author: A. Emerson
A. Miller
A. P. Sistla
A. Pnueli
Allen Emerson
Amir Pnueli
Angelo Montanari
C. N. Ip
Christian Appold
E. A. Emerson
E. M. Clarke
E.A. Emerson
F. Somenzi
G. L. Peterson
I.-H. Moon
J. M. Mellor-Crummey
J. R. Burch
J.-P. Queille
M. Ben-Ari
Margherita Napoli
Mimmo Parente
T. Wahl
V. Gyuris
Publication venue: 'Open Publishing Association'
Publication date: 01/06/2010

One technique to reduce the state-space explosion problem in temporal logic model checking is symmetry reduction. The combination of symmetry reduction and symbolic model checking using BDDs long suffered from the prohibitively large BDD for the orbit relation. Dynamic symmetry reduction calculates representatives of equivalence classes of states dynamically and thus avoids the construction of the orbit relation. In this paper, we present a new efficient model checking algorithm based on dynamic symmetry reduction. Our experiments show that the algorithm is very fast and allows the verification of larger systems. We additionally implemented the use of state symmetries for symbolic symmetry reduction. To our knowledge, we are the first to investigate state symmetries in combination with BDD-based symbolic model checking.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Efficient Data Race Detection for Async-Finish Parallelism

Author: C. Flanagan
C. Sadowski
D. Lea
D. Leijen
E.A. Lee
J. Mellor-Crummey
J.-D. Choi
J.K. Lee
M. Feng
R. Barik
R.D. Blumofe
S. Agarwal
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

A major productivity hurdle for parallel programming is the presence of data races. Data races can lead to all kinds of harmful program behaviors, including determinism violations and corrupted memory. However, runtime overheads of current dynamic data race detectors are still prohibitively large (often incurring slowdowns of 10× or larger) for use in mainstream software development. In this paper, we present an efficient dynamic race detector algorithm targeting the async-finish task-parallel programming model. The async and finish constructs are at the core of languages such as X10 and Habanero Java (HJ). These constructs generalize the spawn-sync constructs used in Cilk, while still ensuring that all computation graphs are deadlock-free. We have implemented our algorithm in a tool called TASKCHECKER and evaluated it on a suite of 12 benchmarks. To reduce overhead of the dynamic analysis, we have also implemented various static optimizations in the tool. Our experimental results indicate that our approach performs well in practice, incurring an average slowdown of 3.05× compared to a serial execution in the optimized case.
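To make the async and finish constructs mentioned in the abstract concrete, the following is a rough sketch of their semantics using Python threads. It is not the TASKCHECKER algorithm and performs no race detection; the class name `Finish` and the method `async_` are hypothetical names chosen for this illustration. The key property shown is that a finish scope waits for every task spawned inside it, including tasks spawned transitively by other tasks.

```python
import threading


class Finish:
    """Hypothetical finish scope: leaving the `with` block waits for all
    tasks spawned through this scope, including nested spawns."""

    def __init__(self):
        self._threads = []
        self._lock = threading.Lock()

    def async_(self, fn, *args):
        """Spawn fn(*args) as a task owned by this finish scope."""
        t = threading.Thread(target=fn, args=args)
        with self._lock:
            # Register and start under the lock so the scope never
            # tries to join a thread that has not been started yet.
            self._threads.append(t)
            t.start()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        i = 0
        while True:
            with self._lock:
                if i >= len(self._threads):
                    break
                t = self._threads[i]
            # A task can only register children while it is running, so
            # after joining it, all of its children are already listed.
            t.join()
            i += 1
        return False


results = []
results_lock = threading.Lock()


def leaf(n):
    with results_lock:
        results.append(n)


def parent(scope):
    # Nested asyncs: the enclosing finish waits for these as well.
    scope.async_(leaf, 1)
    scope.async_(leaf, 2)


with Finish() as scope:
    scope.async_(parent, scope)
    scope.async_(leaf, 3)
# At this point all three leaf tasks have completed.
```

Unlike Cilk's spawn-sync, where a sync waits only for tasks spawned by the current function, the finish scope here also waits for the tasks that `parent` spawned.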

CiteSeerX

Crossref